--- title: Oh the Places You'll Go! author: Evan Canfield date: '2020-03-23' slug: data-viz-travel categories: [data viz] tags: ['R', 'ggplot'] subtitle: '' summary: Visualizing where I've traveled. authors: [] lastmod: '2020-03-23T14:39:26-04:00' featured: yes draft: false image: caption: '' focal_point: 'smart' preview_only: true projects: [] ---
Note: This blog post is based on an original post made for my DSBA 5122 (Visual Analytics) class at UNC Charlotte, in the Spring of 2019.
Ever since I could remember I have loved maps. As a kid I could get lost in an atlas or inspecting a globe. As an adult that love of maps has remained, I mean, there’s even a globe on the desk right next to me as I write this!
So with that as background, it might not sound SO crazy to learn that back in 2018, I made a concerted effort to document where on those maps I had actually stepped foot. I kept it to the United States, since I really haven’t traveled internationally that much. I started big picture: States! I figured out every state I had passed through in my life. This was pretty easy. I grew up in Connecticut, and mostly traveled around New England and the Northeast. The possible list was pretty short.
After quickly determining ever state I had ever crossed into, I decided to delve one layer deeper: Counties! This was a bit trickier, because you had to trace where you would have driven. What route did we take on family visits? Did we ever take the back roads one time? I checked off one county by figuring out I had been cliff jumping in a particular spot in Lake George in New York. The lake was in one county, but the cliffs we in another.
After a while pouring over county-level maps, I’m pretty sure I’ve cataloged at least 95% of all the counties I’ve ever been, maybe even 100%. Also, it turns out county collecting is actually a popular niche hobby. Who knew!
So, that brings us to the purpose of this post. What to do with all that data?! I plan on making this post a living document for the continued cataloging of states and counties. When I visit new states and counties, I’ll update the post. Perhaps in the future, this post will contain information on international travel. But more so, this post is a place to play around with ways to visualize this information.
The following R packages are used for this post.
if (!require(pacman)) {install.packages('pacman')}
p_load(grid, leaflet,stringr, tidyverse, tigris)
options(tigris_use_cache = TRUE)
In order to visually plot the information, I need shapefiles. In a previous project I generated my own set of shapefiles, modified from US Census Bureau files. Those files can be found at the following Github repo. In the future I may do a quick post on that process separately. These shapefiles are in standard dataframe form.
# State Shapefiles
sf_state <- readRDS(file = "data/us_state_map.RDS")
# County Shapefiles
sf_county <- readRDS(file = "data/us_county_map.RDS")
I have documented each state and county I have traveled through in a simple spreadsheet. I’ll import it here.
df <- read.csv("data/catalog_personal_travel.csv",
stringsAsFactors = FALSE)
When the CSV file for the personal travel data was imported, some columns that should be treated as stings get imported as integers. That means for these columns, any leading zeros are dropped. I will need to add them back, and converted the appropriate columns to characters.
# Functin For Adding Leading Zeros
pad_chr <- function(x, na.rm=FALSE) str_pad(string = x,
width = max(nchar(x)),
side = 'left', pad = 0)
df <- df %>%
# Convert state_code and county_code from int to chr
mutate_at(vars(contains('code')), as.character) %>%
mutate_at(vars(contains('code')), pad_chr)
Now, to generate a visual of the state’s I have visited, I will have to process the personal travel catalog data. That data is recorded at a county level. To use, I will need to create a state level version. I will need to perform a group_by action on the state.I will then sum the number of counties I have visited in each state. If the sum is greater than 1, I have visited that state.
Note: There are actually three columns which uniquely identify the state for the corresponding observation. I will group on state_code, which is the State FIPS code. This will make future actions simpler.
df_state <- df %>%
group_by(state_code) %>%
summarise(visited = sum(visited)) %>%
mutate(visited = if_else(visited >= 1, 1, 0))
Now that I have a data set identifying which states I have and have not been to, I can join that data with the state shapefile and plot it. This new file will be the input into *ggplot**.
sf_state_visit <- sf_state %>%
left_join(df_state, by = c("state_fips" = "state_code"))
Before plotting the map, I’m going to define some global ggplot theme values to make it easier to apply the same conditions to multiple plots.
# Style Settings
col_pal <- c("#F4F6F6", "#1F618D")
col_edge <- "gray90"
alpha = 0.75
size = 0.005
# Theme Settings
theme_choropleth <- function(){
theme_void() +
theme(
legend.position = "none"
)
}
With the joined file, I now can plot the data and create a state level choropleth. The blue states are states I have visited.
p_state <- ggplot()
p_state_1 <- p_state + geom_polygon(
data = sf_state_visit,
mapping = aes(x = long, y = lat, group = group, fill = factor(visited)),
color = col_edge,
alpha = alpha,
size = size) +
coord_equal() +
scale_fill_manual(values = col_pal) +
theme_choropleth()
p_state_1

With the state map complete, I’ll create a county level map. The process is essentially the same, except I do not need to process the original data from county to state level.
First, I join the shapefile with the visit data.
sf_county_visit <- sf_county %>%
left_join(df, by = c("state_fips" = "state_code",
"county_fips" = "county_code"))
Then I plot. To the state boundaries clear, I overlay the states lines with the county lines. I’ll create a function to generate the plot because I plan on generating similar plots with the same core code.
plot_county <- function(df_state, df_county){
# County Choropleth
p_county <- ggplot()
p_county_1 <- p_county + geom_polygon(
data = df_county,
mapping = aes(x = long, y = lat, group = group, fill = factor(visited)),
color = col_edge,
alpha = alpha,
size = size) +
coord_equal() +
scale_fill_manual(values = col_pal) +
theme_choropleth()
# State Overlay
P_state_overlay <- geom_polygon(
data = df_state,
mapping = aes(x = long, y = lat, group = group),
color = col_edge,
alpha = 0,
size = 1)
# Combine
p_county_1 + P_state_overlay
}
plot_county(sf_state_visit, sf_county_visit)

Out of curiosity I want to inspect New England and the Carolinas as two separate plots. These are the areas of the country where I have lived the longest, so I expect to see the most coverage here.
I’ll need to filter the county and state data to isolate the six New England states.
# State FIPS codes
new_england_fips = c("09", "23", "25", "33", "44", "50")
sf_county_visit_ne <- sf_county_visit %>%
filter(state_fips %in% new_england_fips)
sf_state_visit_ne <- sf_state_visit %>%
filter(state_fips %in% new_england_fips)
p_ne <- plot_county(sf_state_visit_ne, sf_county_visit_ne)
print(p_ne, vp=viewport(angle=-17.5))

Just like in New England, I filter so only North and South Carolina are in the data.
# State FIPs codes
car_fips = c("37", "45")
sf_county_visit_nc <- sf_county_visit %>%
filter(state_fips %in% car_fips)
sf_state_visit_nc <- sf_state_visit %>%
filter(state_fips %in% car_fips)
Then I plot. I use the grid package to rotate the output ggplot figure to better view North Carolina.
p_car <- plot_county(sf_state_visit_nc, sf_county_visit_nc)
print(p_car, vp=viewport(angle=-13.5))

For now I’m going to try one last thing. I’m going to plot this data in leaflet, an interactive htmlwidget. Unfortunately, I can’t use the shapefile data I was using in previous sections. That data was manipulated to put Alaska and Hawaii in an inset. As as a result, this data won’t plot onto an actual map.Therefore, I need to create a new shapefile for both state and county data.
I will use the US Census Bureau shapefiles, via the Tigris package. The Tigris package provides shapefiles as a SpatialPolygonsDataFrame type file. This data-type is accepted by Leaflet, so I don’t need to convert to a standard dataframe.
#State
spdf_state <- states(cb = TRUE)
#County
spdf_county <- counties(cb = TRUE)
Just like with the ggplot visuals, I need to join the personal travel catalog data to the shapefile data. In this case, the shapefile remains a SpatialPolygonsDataFrame file. This means a standard dplyr join won’t work. I need to use geo_join from the Tigris package.
Before I make the join, I need to create a variable to join on. The shapefile has a variable called GEOID. This is the five digit county/state FIPS code. IT turns out my data set has the county and state FIPS codes, only as seperate columns. So I will combine these codes to create a variable to join with GEOID.
df <- df %>%
mutate(FIPS = str_c(state_code, county_code))
Now, both the shapefile and the travel data share a variable I can join on.
spdf_county_visited <- geo_join(spatial_data = spdf_county,
data_frame = df,
by_sp = "GEOID",
by_df = "FIPS")
Just like with the ggplot visuals, I need to define some general visual settings.
col_pal_leaflet <- colorNumeric(palette = "viridis",
domain = spdf_county_visited$visited)
With everything all set, I can now generate the leaflet interactive visual.
leaflet(options = leafletOptions(minZoom = 3)) %>%
addTiles() %>%
addPolygons(data = spdf_state,
color = "#2E4053",
weight = 1,
opacity = 1,
dashArray = 1.5,
fillColor = "white",
fillOpacity = 0
) %>%
addPolygons(data = spdf_county_visited,
highlight = highlightOptions(color = "white"),
fillColor = ~col_pal_leaflet(spdf_county_visited$visited),
weight = 0.5,
opacity = 1,
color = "white",
dashArray = 2.5,
fillOpacity = 0.25,
label = ~paste(NAME)
) %>%
setView(lng = -96,
lat = 37.8,
zoom = 4) %>%
setMaxBounds(lng1 = -180,
lng2 = -60,
lat1 = 73,
lat2 = 15)